56 research outputs found

Fast Fuzzy Inference in Octave

Author: Corrado Mencar
Gianvito Pio
Piero Molino
Publication date: 01/01/2013

Fuzzy relations are simple mathematical structures that enable a very general representation of fuzzy knowledge, and fuzzy relational calculus offers a powerful machinery for approximate reasoning. However, one of the most relevant limitations of approximate reasoning is its efficiency bottleneck. In this paper, we present two implementations of fast fuzzy inference through relational composition, with the twofold objective of being general and efficient. The two implementations work on full and sparse representations, respectively. Furthermore, a wrapper procedure automatically selects the best implementation on the basis of the input features. We implemented the code in GNU Octave because it is a high-level language targeted at numerical computation. Experimental results show an impressive performance gain when the proposed implementation is used.
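Relational composition is the core operation behind the inference described in this abstract. The following minimal sketch assumes the common max-min (sup-min) composition; the paper may also support other t-norms. It composes two fuzzy relations stored either as dense nested lists or as sparse dictionaries that keep only nonzero memberships. The sparse variant can skip zero entries because min(0, x) = 0 never raises the maximum. The `compose` wrapper and its density threshold of 0.1 are hypothetical stand-ins for the paper's selection procedure, and the code is plain Python rather than Octave.

```python
from typing import Dict, List

Matrix = List[List[float]]            # dense fuzzy relation, memberships in [0, 1]
Sparse = Dict[int, Dict[int, float]]  # row -> {column: nonzero membership}


def maxmin_full(r: Matrix, s: Matrix) -> Matrix:
    """Dense max-min composition: T[i][k] = max_j min(R[i][j], S[j][k])."""
    n, m, p = len(r), len(s), len(s[0])
    t = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(p):
            t[i][k] = max((min(r[i][j], s[j][k]) for j in range(m)), default=0.0)
    return t


def to_sparse(a: Matrix) -> Sparse:
    """Keep only nonzero memberships; rows with no nonzero entry are dropped."""
    sp = {i: {j: v for j, v in enumerate(row) if v > 0.0} for i, row in enumerate(a)}
    return {i: row for i, row in sp.items() if row}


def maxmin_sparse(r: Sparse, s: Sparse) -> Sparse:
    """Sparse max-min composition; zero entries are skipped since min(0, x) = 0."""
    t: Sparse = {}
    for i, r_row in r.items():
        out: Dict[int, float] = {}
        for j, rij in r_row.items():
            for k, sjk in s.get(j, {}).items():
                v = min(rij, sjk)
                if v > out.get(k, 0.0):
                    out[k] = v
        if out:
            t[i] = out
    return t


def density(a: Matrix) -> float:
    """Fraction of nonzero cells in a dense relation."""
    cells = sum(len(row) for row in a)
    nonzero = sum(1 for row in a for v in row if v > 0.0)
    return nonzero / cells if cells else 0.0


def compose(r: Matrix, s: Matrix, threshold: float = 0.1) -> Matrix:
    """Hypothetical wrapper: take the sparse path when both inputs are sparse enough."""
    if density(r) < threshold and density(s) < threshold:
        t = maxmin_sparse(to_sparse(r), to_sparse(s))
        p = len(s[0])
        return [[t.get(i, {}).get(k, 0.0) for k in range(p)] for i in range(len(r))]
    return maxmin_full(r, s)
```

For example, composing R = [[0.2, 0.8], [1.0, 0.0]] with S = [[0.5, 0.9], [0.7, 0.3]] yields [[0.7, 0.3], [0.5, 0.9]] on either path.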

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Bari

Open Access Repository

PRILJ: an efficient two-step method based on embedding and clustering for the identification of regularities in legal case judgments

Author: Gianvito Pio
Graziella De Martino
Michelangelo Ceci
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

In an era characterized by rapid technological progress that introduces new, unpredictable scenarios every day, working in the legal field may appear very difficult without the support of the right tools. In this respect, several systems based on Artificial Intelligence methods have been proposed in the literature to support various tasks in the legal sector. Following this line of research, in this paper we propose a novel method, called PRILJ, that identifies paragraph regularities in legal case judgments to support legal experts during the drafting of legal documents. Methodologically, PRILJ adopts a two-step approach that first groups documents into clusters according to their semantic content, and then identifies regularities in the paragraphs of each cluster. Embedding-based methods are adopted to represent documents and paragraphs in a semantic numerical feature space, and an Approximate Nearest Neighbor Search method is adopted to efficiently retrieve the paragraphs most similar to those of a document under preparation. Our extensive experimental evaluation, performed on a real-world dataset provided by EUR-Lex, demonstrates the effectiveness and efficiency of the proposed method. In particular, its ability to model different topics of legal documents and to capture the semantics of the textual content appears very beneficial for the considered task, and makes PRILJ very robust to possible noise in the data.
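The two-step retrieval idea can be sketched as follows. The sketch assumes that document and paragraph embeddings have already been computed. The toy vectors and the helpers `kmeans` and `nearest_paragraphs` are hypothetical. For simplicity, it uses plain k-means for the clustering step. In place of the Approximate Nearest Neighbor Search used by PRILJ, it uses an exact cosine-similarity scan restricted to the target cluster. It illustrates narrowing the search to semantically related documents, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(points: List[Vec], k: int, iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's k-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def nearest_paragraphs(doc_vec: Vec, par_vecs: List[Vec], corpus_doc_vecs: List[Vec],
                       corpus_pars: List[List[Tuple[str, Vec]]], k: int) -> List[Optional[str]]:
    """Step 1: cluster corpus documents and pick the new document's cluster.
    Step 2: for each new paragraph, return the id of the most cosine-similar
    paragraph among documents in that cluster (exact scan, not ANN)."""
    centroids, labels = kmeans(corpus_doc_vecs, k)
    cluster = min(range(k), key=lambda c: sq_dist(doc_vec, centroids[c]))
    pool = [(pid, v) for d, lab in enumerate(labels) if lab == cluster
            for pid, v in corpus_pars[d]]
    if not pool:
        return [None] * len(par_vecs)
    return [max(pool, key=lambda item: cosine(q, item[1]))[0] for q in par_vecs]
```

Restricting the scan to one cluster trades recall for speed. A closely matching paragraph in a different cluster is never considered, which is why the quality of the clustering step matters.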

Archivio istituzionale della ricerca - Università di Bari

DENCAST: distributed density-based clustering for multi-target regression

Author: Donato Malerba
Gianvito Pio
Michelangelo Ceci
Roberto Corizzo
Publication date: 03/06/2019

Recent developments in sensor networks and mobile computing have led to a huge increase in the amount of data generated, which needs to be processed and analyzed efficiently. In this context, many distributed data mining algorithms have recently been proposed. Following this line of research, we propose the DENCAST system, a novel distributed algorithm implemented in Apache Spark, which performs density-based clustering and exploits the identified clusters to solve both single- and multi-target regression tasks (and thus complex tasks such as time series prediction). Contrary to existing distributed methods, DENCAST does not require a final merging step (usually performed on a single machine) and is able to handle large-scale, high-dimensional data by taking advantage of locality-sensitive hashing. Experiments show that DENCAST performs clustering more efficiently than a state-of-the-art distributed clustering algorithm, especially when the number of objects increases significantly. The quality of the extracted clusters is confirmed by the predictive capabilities of DENCAST on several datasets: it significantly outperforms (p-value < 0.05) state-of-the-art distributed regression methods in both single- and multi-target settings.
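The following is a single-machine sketch of the ideas in this abstract. Points are bucketed by a locality-sensitive hash, and density-based (DBSCAN-style) neighborhoods are computed only within a bucket. A multi-target prediction for a new object is the mean target vector of the cluster that contains its nearest clustered training point. The random-hyperplane hash family, the parameters `eps` and `min_pts`, and the prediction rule are assumptions made for illustration. The sketch does not model DENCAST's distribution over Apache Spark or how it avoids the final merging step.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def random_planes(dim: int, n: int, seed: int = 0) -> List[List[float]]:
    """Random hyperplane normals for SimHash-style hashing (an assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def lsh_key(x: Vec, planes: List[List[float]]) -> Tuple[bool, ...]:
    """Bucket key: which side of each hyperplane the point lies on."""
    return tuple(sum(a * b for a, b in zip(p, x)) >= 0.0 for p in planes)


def dbscan_lsh(X: List[Vec], eps: float, min_pts: int, planes: List[List[float]]) -> List[int]:
    """DBSCAN-style clustering with neighbors searched only inside the LSH bucket.
    Returns a cluster id per point, or -1 for noise."""
    keys = [lsh_key(x, planes) for x in X]
    buckets: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)
    for i, key in enumerate(keys):
        buckets[key].append(i)

    def neighbors(i: int) -> List[int]:
        return [j for j in buckets[keys[i]] if math.dist(X[i], X[j]) <= eps]

    labels = [-1] * len(X)
    cid = 0
    for i in range(len(X)):
        if labels[i] != -1:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:  # not a core point; may still become a border point later
            continue
        labels[i] = cid
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] != -1:
                continue
            labels[j] = cid
            nbj = neighbors(j)
            if len(nbj) >= min_pts:  # only core points expand the cluster
                queue.extend(nbj)
        cid += 1
    return labels


def predict(x: Vec, X: List[Vec], Y: List[Vec], labels: List[int]) -> List[float]:
    """Mean target vector of the cluster holding x's nearest clustered training point.
    Assumes at least one training point was assigned to a cluster."""
    clustered = [i for i, lab in enumerate(labels) if lab != -1]
    nearest = min(clustered, key=lambda i: math.dist(x, X[i]))
    members = [Y[i] for i in clustered if labels[i] == labels[nearest]]
    return [sum(col) / len(members) for col in zip(*members)]
```

Because neighbors are only searched within a bucket, a hyperplane that cuts through a dense region can split it. Using several hash tables is a common way to reduce that risk, at extra cost.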

Open Access Repository

Advancing microbiome research with machine learning: key findings from the ML4Microbiome COST action

The rapid development of machine learning (ML) techniques has opened up the data-dense field of microbiome research to novel therapeutic, diagnostic, and prognostic applications targeting a wide range of disorders, which could substantially improve healthcare practices in the era of precision medicine. However, several challenges must be addressed to fully exploit the benefits of ML in this field. In particular, there is a need to establish "gold standard" protocols for conducting ML analysis experiments and to improve interactions between microbiome researchers and ML experts. The Machine Learning Techniques in Human Microbiome Studies (ML4Microbiome) COST Action CA18131 is a European network established in 2019 to promote collaboration between discovery-oriented microbiome researchers and data-driven ML experts in order to optimize and standardize ML approaches for microbiome analysis. This perspective paper presents the key achievements of ML4Microbiome, which include identifying predictive and discriminatory 'omics' features, improving repeatability and comparability, developing automation procedures, and defining priority areas for the novel development of ML methods targeting the microbiome. The insights gained from ML4Microbiome will help to maximize the potential of ML in microbiome research and pave the way for new and improved healthcare practices.

Utrecht University Repository

Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions

Grant identifiers: CA18131, CP16/00163, NIS-3317, NIS-3318, decision 295741, C18/BM/12585940

The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. Many of the challenges in microbiome research are similar to those of other high-throughput studies: quantitative analyses need to address the heterogeneity of the data, their specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Nevertheless, although many statistical and machine learning approaches and tools have been developed, new techniques are needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome", which brings together microbiome researchers and machine learning experts to address current challenges such as the standardization of analysis pipelines for reproducibility of data analysis results, benchmarking, and the improvement or development of existing and new tools and ontologies.

University of Bergen

Repositório da Universidade Nova de Lisboa

EUR Research Repository

Cork Open Research Archive

NORA - Norwegian Open Research Archives

Open Repository and Bibliography - Luxembourg

Utrecht University Repository

Erciyes University - AVESIS

Riga Stradins university

Fondo Bibliográfico Digital Institucional

Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions

UTUPub

Integrating microRNA target predictions for the discovery of gene regulatory networks: a semi-supervised ensemble learning approach

Author: Domenica D’Elia
Donato Malerba
Gianvito Pio
Michelangelo Ceci
Publication venue: Chapman and Hall
Publication date: 01/01/2014

Background: MicroRNAs (miRNAs) are small non-coding RNAs that play a key role in the post-transcriptional regulation of many genes. Elucidating miRNA-regulated gene networks is crucial for understanding the mechanisms and functions of miRNAs in many biological processes, such as cell proliferation, development, differentiation and cell homeostasis, as well as in many types of human tumors. To this end, we recently presented the biclustering method HOCCLUS2 for the discovery of miRNA regulatory networks. Experiments on predicted interactions revealed that the statistical and biological consistency of the obtained networks is negatively affected by the poor reliability of the output of miRNA target prediction algorithms. Recently, some approaches have been proposed that learn to combine the outputs of distinct prediction algorithms and improve their accuracy. However, the application of classical supervised learning algorithms presents two challenges: i) datasets of experimentally verified interactions contain only positive examples, and ii) the numbers of labeled and unlabeled examples are unbalanced.

Results: We present a learning algorithm that learns to combine the scores returned by several prediction algorithms by exploiting the information conveyed by validated (i.e., positively labeled only) and unlabeled examples of interactions. To address these two challenges, we resort to a semi-supervised ensemble learning setting. Results obtained using miRTarBase as the set of labeled (positive) interactions and mirDIP as the set of unlabeled interactions show a significant improvement in prediction quality over competitive approaches. This solution also improves the effectiveness of HOCCLUS2 in discovering biologically realistic miRNA:mRNA regulatory networks from large-scale prediction data. Using the miR-17-92 gene cluster family as a reference system and comparing results with previous experiments, we find a large increase in the number of biclusters significantly enriched in pathways, consistent with miR-17-92 functions.

Conclusion: The proposed approach proves to be fundamental for the computational discovery of miRNA regulatory networks from large-scale predictions. This paves the way to the systematic application of HOCCLUS2 for a comprehensive reconstruction of all the possible multiple interactions established by miRNAs in regulating the expression of gene networks, which would otherwise be impossible to reconstruct by considering only experimentally validated interactions.
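The following deliberately simple positive-unlabeled heuristic illustrates how scores from several target prediction algorithms can be combined using only positive and unlabeled examples. It is not the semi-supervised ensemble method of the paper. Each algorithm is weighted by how well it ranks validated interactions above unlabeled ones, measured as an AUC that treats unlabeled examples as provisional negatives. The combined score is the weighted sum. The sketch assumes that scores have already been normalized to [0, 1], and the function names are hypothetical.

```python
from typing import List, Sequence, Set


def pu_auc(scores: Sequence[float], positives: Set[int]) -> float:
    """Probability that a random validated interaction outscores a random unlabeled one
    (ties count 1/2). Unlabeled examples act as provisional negatives."""
    pos = [s for i, s in enumerate(scores) if i in positives]
    unl = [s for i, s in enumerate(scores) if i not in positives]
    if not pos or not unl:
        return 0.5  # no evidence either way
    wins = 0.0
    for p in pos:
        for u in unl:
            wins += 1.0 if p > u else (0.5 if p == u else 0.0)
    return wins / (len(pos) * len(unl))


def learn_weights(score_matrix: List[List[float]], positives: Set[int]) -> List[float]:
    """One weight per algorithm: its PU-AUC gain over random (0.5), normalized to sum to 1.
    score_matrix[i][a] is the score of algorithm a for interaction i."""
    m = len(score_matrix[0])
    raw = [max(0.0, pu_auc([row[a] for row in score_matrix], positives) - 0.5)
           for a in range(m)]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else [1.0 / m] * m


def combine(score_matrix: List[List[float]], weights: List[float]) -> List[float]:
    """Combined score per interaction: weighted sum of the (assumed normalized) scores."""
    return [sum(w * s for w, s in zip(weights, row)) for row in score_matrix]
```

Because unlabeled examples contain some true interactions, this AUC underestimates each algorithm's real accuracy. The heuristic only relies on the relative ordering of algorithms being preserved.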

Crossref

Springer

Springer - Publisher Connector

Archivio istituzionale della ricerca - Università di Bari

PubMed Central

Full Results

Author: Gianvito Pio (827525)
Publication date: 15/11/2022

Full results obtained by all the considered ML methods at the national and regional levels.

FigShare

Code

Author: Gianvito Pio (827525)
Publication date: 15/11/2022

Code used to run the experiments.

FigShare

Semi-supervised Multi-View Learning for Gene Network Reconstruction

Author: Gianvito Pio (827525)

SynTReN data:
E. coli and Yeast sub-networks, generated expression data and gold standards (Input_Datasets.zip)
Interactions predicted by the base methods (Base_Method_Predictions.zip)
Interactions predicted by our approach, clustering performed with PCA (Predictions.zip)
Interactions predicted by our approach, clustering performed with K-means (PredictionsK.zip)

DREAM5 data:
Expression data and gold standards provided by Marbach et al. 2012 [1] (Input_Datasets_D5.zip)
Interactions predicted by the considered DREAM5 base methods, provided by Marbach et al. 2012 [1] (Base_Method_Predictions_D5.zip)
Interactions predicted by our approach, clustering performed with PCA (Predictions_D5.zip)
Interactions predicted by our approach, clustering performed with K-means (PredictionsK_D5.zip)

[1] Marbach, D., Costello, J. C., Küffner, R., Vega, N. M., Prill, R. J., Camacho, D. M., Allison, K. R., Kellis, M., Collins, J. J., and Stolovitzky, G., Wisdom of crowds for robust gene network inference, Nature Methods, 9, 796-804, 2012.

FigShare